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Abstract 

Genome-wide association studies have successfully identified many common variants associated with complex 
human diseases. However, a large portion of the remaining heritability cannot be explained by these common 
variants. Exploring rare variants associated with diseases is now catching more attention. Several methods have 
been recently proposed for identification of rare variants. Among them, the fixed-threshold, weighted-sum, and 
variable-threshold methods are effective in combining the information of multiple variants into a functional unit; 
these approaches are commonly used. We evaluate the performance of these three methods. Based on our 
analyses of the Genetic Analysis Workshop 17 data, we find that no method is universally better than the others. 
Furthermore, adjusting for potential covariates can not only increase the true-positive proportions but also reduce 
the false-positive proportions. Our study concludes that there is no uniformly most powerful test among the three 
methods we compared (the fixed-threshold, weighted-sum, and variable-threshold methods), and their 
performances depend on the underlying genetic architecture of a disease. 



Background 

In the past several years, genome-wide association stu- 
dies (GWAS) have successfully identified many common 
single-nucleotide polymorphisms (SNP) (say, minor 
allele frequency [MAF] > 5%) associated with complex 
human diseases. Despite the findings from GWAS, a 
large portion of the remaining heritability cannot be 
explained by these common variants [1]. The impor- 
tance of detecting rare variants has thus been recog- 
nized. However, exploring rare variants that are 
associated with diseases is challenging because of their 
low frequencies and individually small contributions to 
the susceptibility to a disease [2]. Recently, several meth- 
ods have been proposed for detecting rare variants (for 
an overview see Dering et al. [3]). Most of these meth- 
ods pool signals of multiple rare variants into a func- 
tional unit, such as a candidate gene, and then test the 
association between the pooled signal and the disease 
[4-7]. For these methods, the choice of a threshold to 



* Correspondence: nliu@uab.edu 

department of Biostatistics, University of Alabama at Birmingham, 1665 

University Boulevard, Birmingham, AL 35294, USA 

Full list of author information is available at the end of the article 

(3 BioMed Central 



discriminate rare variants from common variants plays 
an important role. If the threshold is too high, variants 
with relatively high MAFs will dominate the results of 
association tests for the genes. On the other hand, if the 
threshold is too low, the statistical power of the associa- 
tion tests will tend to become unnecessarily low. The 
specification of a threshold is crucial to the performance 
of a pooled association test. In this paper, we evaluate 
the performance of several methods using the simulated 
data of unrelated individuals from Genetic Analysis 
Workshop 17 (GAW17) [8]. 

Methods 

Three analysis methods 

Some of the proposed pooling methods first specify a 
fixed threshold for the MAF and then perform associa- 
tion tests on the set of variants with MAFs smaller than 
that threshold [4,6]. The weighted-sum method [5] 
extends this idea and weights each variant by the inverse 
square root of the expected variance based on the allele 
frequencies. The larger the MAF, the smaller the weight 
given to that variant. However, this weighting scheme 
restricts the effect of a functional variant to be 
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statistically related to its allele frequency, which may not 
be plausible in some situations. To address the issue of 
a preset threshold for the MAF, Price et al [7] proposed 
a variable-threshold (VT) approach. The VT approach 
groups rare variants together by searching for an opti- 
mal threshold that maximizes the difference between 
trait distributions for subjects with and without the rare 
variants. Using the data from GAW17, we compare the 
performance of the VT method with the performance of 
the weighted-sum (WS) method [5] and the fixed- 
threshold method of Morris and Zeggini [6] with thresh- 
olds of 1% and 5% (denoted Tl and T5, respectively). 

Data 

We use the GAW17 data for 697 unrelated individuals 
with variants on 22 autosomal chromosomes. There are 
24,487 SNPs located in 3,205 genes on these chromo- 
somes. We use the start and end positions (in base pairs) 
of each gene to pick the SNPs falling within the bound- 
aries of that gene. We analyze all the phenotypes avail- 
able: Ql, Q2, Q4, and the binary trait (Affected). All the 
200 simulated replicates were studied. To evaluate the 
performance of the four tests (Tl, T5, WS, and VT), we 
requested the simulation answers and compared the 
answers with the results obtained from the four tests. 

Variable threshold software 

All the analyses were performed using the VT test soft- 
ware (http://genetics.bwh.harvard.edu/rare_variants/) of 
Price et al. [7]. The VT software performs Tl, T5, WS, 
and VT tests in each analysis. The statistical models are 
all based on linear regressions: 
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where y is the phenotype, x t is the number of the ith 
rare variant in gene- G, w t is .the weight given to the ith. 
rare variant, and ^° and P are the estimates of the 
regression coefficients. For the Tl (or T5) test, w t equals 
1 if the frequency of the ith variant is less than 1% (or 
5%) and 0 otherwise. For the WS test, 
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where p t is the allele frequency of the ith variant. 
For the VT test, a z-score: 



z(T) = - 



P(T) 



se[/?(tj] 
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is computed for each allele frequency threshold T, 
where SE represents the standard error. Let z max be the 
maximum z score among all possible values of T; then, 
the significance of z max is assessed by permutation of 
phenotypes. Suppose that we perform p permutations 
and therefore have z max ,i> which are the 

maximum z scores obtained at their optimal thresholds 
Ti, T 2 , T p , respectively. To ensure the validity of the 
VT test, the software allows 7\, T 2 , T p for permuted 
data to be different from the optimal threshold T for 
the original data. The VT software then compares z max 
with z max ,i, £ max , 2 > z maXtP to determine its statistical 
significance. 

When we performed the Tl, T5, WS, and VT tests 
with the VT software, each j?-value was calculated based 
on 100,000 permutations. To increase the computation 
speed, the VT test uses linear regression instead of 
logistic regression to analyze all phenotypes, including 
the binary trait. To understand the influence of adjust- 
ment for covariates, we compared the results when 
ignoring all covariates with the results obtained when 
adjusting for Age and Smoking status. We first obtained 
the residuals by regressing the phenotypes on Age and 
Smoking status, and then the residuals were regarded as 
the adjusted phenotypes and were analyzed by the VT 
software. 

Results 

Type I error rates 

The phenotype Q4 is not related to any of the 3,205 
genes, so we use this part of the results to evaluate type 
I error rates. Given a significance level a, we estimate 
the type I error rate using: 
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where /(♦) is the indicator function, p i>r is the p-vdlue 
of the ith gene in the rth replicate, T g is the set formed 
by all genes, and | T g \ is the number of genes in T g . 
Note that the set T g varies with different methods. 
Because the Tl test considers only genes that have at 
least one SNP with MAF less than 1%, the total number 
of genes considered by the Tl test is 2,485. Similarly, 
the T5 test considers only genes that include at least 
one SNP with MAF less than 5%, and the total number 
of genes considered by the T5 test is 2,881. The WS 
and VT tests are performed for all genes without pre- 
specifying a threshold, so the total number of genes 
considered by both of these tests is 3,205. 
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Table 1 Type I error rates (results based on analyzing 
Q4) 
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a = 0.05 


0.04575 
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a= 0.1 
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With adjustment for Age and 
Smoking status 
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0.00001 


0.00002 


0.00001 


0.00001 


a = 0.001 


0.00096 


0.00094 


0.00095 


0.00095 


a = 0.005 


0.00473 


0.00491 


0.00485 


0.00485 


a = 0.01 


0.00956 


0.00973 


0.00966 


0.00972 


a = 0.05 


0.04918 


0.05040 


0.05037 


0.04990 


a= 0.1 


0.09906 


0.10148 


0.10159 
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Table 1 shows the type I error rates for the four tests. 
When we ignore all the covariates, the type I error rates 
are generally inflated for the WS and VT tests and 
slightly inflated for the T5 test. After adjusting for Age 
and Smoking status, this inflation of type I error rates 
disappears. We further discovered that the inflation of 
type I error rates disappeared so long as Age was 
adjusted, but it remained if only Smoking status was 
adjusted. To verify this, we deliberately let Age be the 
outcome variable and tested its association with genes. 
We found that the rates of rejection of no association 
(between genes and Age) were generally larger than the 
nominal significance levels (when the significance level 
was set at 5%, the average rejection rates were 7.6%, 
11.9%, 11.0%, and 12.4% for the Tl, T5, WS, and VT 
tests, respectively). However, when we let Smoking sta- 
tus be the outcome variable and tested its association 
with genes, the average rejection rates matched the 
nominal significance levels. This suggests that the 
observed inflation of type I error rates comes from 
some latent confounders (e.g., population stratification 
or preferential death of carriers with some particular 
genotypes), and we can remove the false-positive find- 
ings by adjusting for Age. 

ROC curves 

The phenotypes Ql, Q2, and the binary trait (Affected) 
are related to some genes, so we use the results on 
these three phenotypes to evaluate the true-positive pro- 
portions and the false-positive proportions. Figure 1 pre- 
sents the receiver operating characteristic (ROC) curves 
of the four tests. Given a significance level a, we esti- 
mate the true-positive proportion using: 
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and the false-positive proportion using: 
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where S g is the set formed by disease genes, |5^| is the 
number of genes in S g , and {T g \S g } is the set formed by 
genes unrelated to the disease. The set of S g is provided 
by the underlying simulation model. 

From the results, all methods perform better for con- 
tinuous traits than for the binary trait. For continuous 
traits, these methods perform better for Ql than for Q2. 
This is reasonable because Ql has higher residual herit- 
ability. For phenotypes Ql and Affected, the areas under 
the ROC curves (i.e., the AUC) increased when the phe- 
notypes were adjusted for Age and Smoking status. 
However, adjusting for these two covariates did not 
have any influence on the ROC curves for Q2. This is 
also reasonable because Q2 is not influenced by any 
covariate, according to the underlying simulation model. 

Discussion 

In this study, we evaluated the performance of three 
methods (four tests)— fixed-threshold, weighted-sum, 
and variable-threshold methods— for detecting rare var- 
iants. The main difference between these methods is the 
selection of a threshold to discriminate rare from com- 
mon variants. Based on the simulation model for Ql, 
most true signals are variants with MAF < 5%, so the 
T5 test is the best method. The only two exceptions are 
C13S523 (MAF = 6.67%) in gene FLT1 and C4S1878 
(MAF = 16.50%) in gene KDR. However, the powers of 
the four tests are all high for detecting FLT1 and KDR, 
because the two genes include many functional variants. 
Excluding C13S523 from FLT1 or excluding C4S1878 
from KDR makes no difference to the final results. The 
VT test is inferior to the T5 test because of the inclu- 
sion of the higher threshold (>5%), which increases 
noise and reduces power. When analyzing the pheno- 
type Q2, we found that the VT test was the most 
powerful method for detecting gene VNN1. However, 
both the Tl test and the T5 test performed poorly in 
detecting VNN1, because one of the two functional var- 
iants in VNN1 is relatively common (MAF = 17%). 
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Figure 1 ROC curves for the four tests. The top row presents the results without adjustment for any covariate; the second row presents the 
results with adjustment for Age and Smoking status. In the parentheses are the areas under the ROC curves. In the bottom row we show the 
combined ROC curves without and with adjustment for covariates. 



Therefore, when analyzing Q2, the VT test is slightly 
better than the other methods. For the binary trait 
(Affected), all tests have similar (poor) performances. 
This is because the binary trait was determined by a 
model including noise (Q4). Not surprisingly, analyzing 
this trait is more challenging than analyzing Ql or Q2. 



Based on our results, we found that inflated type I error 
rates were caused by potential confounders not adjusted 
for in the models (Table 1). If a phenotype is related to 
some covariates, adjusting for these covariates can also 
increase the true-positive proportions (Ql and Affected 
in Figure 1). However, if a phenotype is not related to 
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the covariates, adjusting for them has no influence on the 
true-positive proportions (Q2 in Figure 1). 

Conclusions 

We evaluated the performance of three methods (fixed- 
threshold, weighted-sum, and variable-threshold meth- 
ods) in pooling signals of multiple rare variants in a 
gene. Based on our analyses for the GAW17 data, we 
find that no method is universally better than the 
others. Furthermore, adjusting for potential covariates 
can not only increase the true-positive proportions but 
also reduce the false-positive proportions. Our study 
provides an overall evaluation of the three popular 
pooled association methods with the GAW17 exome 
simulation data. This can provide insights to determine 
a strategy for analyzing exome sequencing data. 
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